Assembling draft genomes using contiBAIT
Abstract
Summary: Massively parallel sequencing is now widely used, but data interpretation is only as good as the reference assembly to which it is aligned. While the number of reference assemblies has rapidly expanded, most of these remain at intermediate stages of completion, either as scaffold builds, or as chromosome builds (consisting of correctly ordered, but not necessarily correctly oriented scaffolds separated by gaps). Completion of de novo assemblies remains difficult, as regions that are repetitive or hard to sequence prevent the accumulation of larger scaffolds, and create errors such as misorientations and mislocalizations. Thus, complementary methods for determining the orientation and positioning of fragments are important for finishing assemblies. Strand-seq is a method for determining template strand inheritance in single cells, information that can be used to determine relative genomic distance and orientation between scaffolds, and find errors within them. We present contiBAIT, an R/Bioconductor package which uses Strand-seq data to repair and improve existing assemblies.

Availability and Implementation: contiBAIT is available on Bioconductor. Source files available from GitHub.

Contact: [email protected] or [email protected].

Supplementary information: Supplementary data are available at Bioinformatics online.
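As a rough illustration of the idea in the summary, that template strand inheritance across single cells carries information about scaffold linkage and relative orientation, the toy sketch below compares per-cell strand states for pairs of contigs. It is not contiBAIT's API or algorithm: the function names, the 0.8 threshold, and the example data are hypothetical. It assumes each contig in each cell has already been called as WW, CC, or WC, and that contigs from the same chromosome tend to share a state in every cell (or show the mirrored state if one of them is misoriented).

```python
from typing import Optional, Sequence

HOMOZYGOUS = ("WW", "CC")  # states that are informative for orientation


def concordance(a: Sequence[str], b: Sequence[str]) -> Optional[float]:
    """Fraction of informative cells in which two contigs share a strand state.

    A cell is informative only if both contigs are WW or CC in that cell.
    Returns None when no cell is informative.
    """
    informative = [
        (x, y) for x, y in zip(a, b) if x in HOMOZYGOUS and y in HOMOZYGOUS
    ]
    if not informative:
        return None
    return sum(x == y for x, y in informative) / len(informative)


def classify(a: Sequence[str], b: Sequence[str], threshold: float = 0.8) -> str:
    """Label a contig pair using a hypothetical concordance threshold."""
    score = concordance(a, b)
    if score is None:
        return "uninformative"
    if score >= threshold:
        return "linked, same orientation"
    if score <= 1 - threshold:
        return "linked, opposite orientation"
    return "unlinked"


# Hypothetical strand-state calls for three contigs across six cells.
contig_a = ["WW", "CC", "WC", "WW", "CC", "CC"]
contig_b = ["CC", "WW", "WC", "CC", "WW", "WW"]  # mirror image of contig_a
contig_c = ["WW", "WW", "CC", "CC", "WW", "CC"]  # unrelated pattern

print(classify(contig_a, contig_b))
print(classify(contig_a, contig_c))
```

With real data, many more cells would be needed before a concordance score could be trusted, and the threshold would have to be chosen empirically.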
Similar resources
Barnacle: An Assembly Algorithm for Clone-based Sequences of Whole Genomes
We propose Barnacle, an assembly algorithm for sequences generated by the clone-based approach. We illustrate our approach by assembling the human genome. Our novel method abandons the original physical-mapping-first framework. As we show, Barnacle more effectively resolves conflicts due to repeated sequences, which are the main difficulty of the sequence assembly problem. In addition, we are able...
Assembly of the draft genome of buckwheat and its applications in identifying agronomically useful genes
Buckwheat (Fagopyrum esculentum Moench; 2n = 2x = 16) is a nutritionally dense annual crop widely grown in temperate zones. To accelerate molecular breeding programmes of this important crop, we generated a draft assembly of the buckwheat genome using short reads obtained by next-generation sequencing (NGS), and constructed the Buckwheat Genome DataBase. After assembling short reads, we determi...
Optimization of dengue virus genome assembling using GSFLX 454 pyrosequencing data: evaluation of assembling strategies.
Currently, assembling genomes without a reference is one of the most important challenges for bioinformaticians worldwide seeking to characterize new organisms. The current study used two dengue virus type 4 (DENV-4) strains recently isolated in Brazil, whose genomes were sequenced on the GSFLX 454 sequencer (Roche, Life Science) by the pyrosequencing method. The GSFLX 45...
Tough Mining
Caenorhabditis elegans, a 1-mm soil-dwelling roundworm with 959 cells, may be the best-understood multicellular organism on the planet. As the most "pared-down" animal that shares essential features of human biology, from embryogenesis to aging, C. elegans is a favorite subject for studying how genes control these processes. The way these genes work in worms helps scientists understand how dise...
Assembly of polymorphic genomes: algorithms and application to Ciona savignyi.
Whole-genome assembly is now used routinely to obtain high-quality draft sequence for the genomes of species with low levels of polymorphism. However, genome assembly remains extremely challenging for highly polymorphic species. The difficulty arises because two divergent haplotypes are sequenced together, making it difficult to distinguish alleles at the same locus from paralogs at different l...